multi-target regression
Volume-Sorted Prediction Set: Efficient Conformal Prediction for Multi-Target Regression
We introduce Volume-Sorted Prediction Set (VSPS), a novel method for uncertainty quantification in multi-target regression that combines conditional normalizing flows with conformal calibration. The approach constructs flexible, non-convex prediction regions with guaranteed coverage probabilities, overcoming limitations of traditional methods. By learning a transformation under which the conditional distribution of the responses follows a known form, VSPS identifies dense regions in the original space using the Jacobian determinant of that transformation. This yields prediction regions that adapt to the true underlying distribution and concentrate on areas of high probability density. Experimental results demonstrate that VSPS produces smaller, more informative prediction regions while maintaining robust coverage guarantees, improving uncertainty modeling in complex, high-dimensional settings.
Introduction
In real-world applications, it is often necessary to estimate more than one response variable [1, 2, 3]. Consider, for example, estimating the effects and side effects of a drug given a patient's demographic information and medical measurements.
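The calibration mechanics can be sketched in Python (NumPy). This is not the VSPS algorithm itself. It assumes a hypothetical one-layer conditional affine flow z = (y - mu(x)) / sigma(x) with a standard normal base, so the log-Jacobian term is -sum(log sigma_j). It then uses the negative log-density as the nonconformity score with a standard split-conformal quantile. With such a simple flow the region is an ellipsoid; the non-convex regions described above would need a more expressive flow. The names `affine_flow_log_density`, `conformal_threshold` and `in_region` are illustrative.

```python
import math
import numpy as np

def affine_flow_log_density(y, mu, sigma):
    """log p(y | x) under a hypothetical conditional affine flow z = (y - mu) / sigma.

    The base density is standard normal; by the change-of-variables formula the
    log|det Jacobian| of y -> z, namely -sum(log sigma), is added to the base log-density.
    """
    z = (y - mu) / sigma
    d = y.shape[-1]
    log_base = -0.5 * np.sum(z ** 2, axis=-1) - 0.5 * d * math.log(2 * math.pi)
    log_det = -np.sum(np.log(sigma), axis=-1)
    return log_base + log_det

def conformal_threshold(scores, alpha):
    """Split-conformal threshold: the ceil((n + 1)(1 - alpha))-th smallest score."""
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf  # too few calibration points: the region is the whole space
    return float(np.sort(scores)[k - 1])

def in_region(y, mu, sigma, threshold):
    """A candidate y belongs to the region iff its score -log p(y | x) is <= threshold."""
    return -affine_flow_log_density(y, mu, sigma) <= threshold

# Usage on synthetic data; mu and sigma stand in for the outputs of a fitted flow.
rng = np.random.default_rng(0)

def simulate(n, d=2):
    x = rng.uniform(-1, 1, size=(n, 1))
    mu = np.hstack([x, -x])        # hypothetical conditional mean
    sigma = np.full((n, d), 0.5)   # hypothetical conditional scale
    y = mu + sigma * rng.standard_normal((n, d))
    return y, mu, sigma

y_cal, mu_cal, sig_cal = simulate(500)
threshold = conformal_threshold(-affine_flow_log_density(y_cal, mu_cal, sig_cal), alpha=0.1)

y_test, mu_test, sig_test = simulate(2000)
coverage = in_region(y_test, mu_test, sig_test, threshold).mean()
print(f"empirical coverage: {coverage:.3f}")  # compare with the nominal level 1 - alpha = 0.9
```

Scoring by density rather than by distance from a point prediction is what lets the region follow high-density areas; only the flow determines its shape.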
Local Interpretability of Random Forests for Multi-Target Regression
Bardos, Avraam, Mylonas, Nikolaos, Mollas, Ioannis, Tsoumakas, Grigorios
Multi-target regression is useful in a wide range of applications. Although random forest models perform well on these tasks, they are often difficult to interpret. Interpretability is crucial in machine learning, especially when a model's decisions can directly affect human well-being. Model-agnostic interpretation techniques exist for multi-target regression, but no techniques tailored specifically to random forest models are available. To address this gap, we propose a technique that provides rule-based interpretations of individual predictions made by a random forest model for multi-target regression, inspired by a recent model-specific technique for random forest interpretability. Extensive experiments show that the proposed technique offers interpretations competitive with those of state-of-the-art techniques.
Introducing Multi-target Regression
In this review, I summarize a well-known paper, Multi-target regression via input space expansion: treating targets as inputs [1]. Note that this is the work of Grigorios Tsoumakas et al., who wrote the paper; this review is simply an introduction to and summary of their original work. I will try to explain the paper simply, with related examples. MTR (multi-target regression), also called multi-output or multivariate regression, refers to models that predict several output variables at once from the same input data.
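The stacked single-target (SST) idea from that paper, treating first-stage target estimates as extra inputs, can be sketched in Python (NumPy). The base learner is a tiny k-nearest-neighbour regressor chosen only for illustration. The second stage is trained on in-sample first-stage estimates for brevity; out-of-fold estimates are a common alternative that reduces their optimism. Class names are hypothetical.

```python
import numpy as np

class KNNRegressor:
    """Tiny k-nearest-neighbour regressor, an illustrative stand-in base learner."""

    def __init__(self, k=5):
        self.k = k

    def fit(self, X, y):
        self.X, self.y = np.asarray(X, float), np.asarray(y, float)
        return self

    def predict(self, X):
        X = np.asarray(X, float)
        d2 = ((X[:, None, :] - self.X[None, :, :]) ** 2).sum(axis=-1)
        nearest = np.argsort(d2, axis=1)[:, : self.k]
        return self.y[nearest].mean(axis=1)

class StackedSingleTarget:
    """Two-stage sketch: stage 2 sees the inputs X plus stage-1 estimates of every target."""

    def __init__(self, make_base):
        self.make_base = make_base

    def _stage1(self, X):
        return np.column_stack([m.predict(X) for m in self.stage1])

    def fit(self, X, Y):
        X, Y = np.asarray(X, float), np.asarray(Y, float)
        self.stage1 = [self.make_base().fit(X, Y[:, j]) for j in range(Y.shape[1])]
        X_aug = np.hstack([X, self._stage1(X)])  # in-sample estimates, for brevity
        self.stage2 = [self.make_base().fit(X_aug, Y[:, j]) for j in range(Y.shape[1])]
        return self

    def predict(self, X):
        X = np.asarray(X, float)
        X_aug = np.hstack([X, self._stage1(X)])
        return np.column_stack([m.predict(X_aug) for m in self.stage2])

# Usage: two correlated targets driven by the same inputs.
rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(200, 3))
y1 = np.sin(3 * X[:, 0]) + 0.1 * rng.standard_normal(200)
Y = np.column_stack([y1, y1 + X[:, 1]])
model = StackedSingleTarget(lambda: KNNRegressor(k=5)).fit(X, Y)
pred = model.predict(X[:5])
```

A nonlinear base learner matters here: with purely linear base models, the stage-1 estimates are linear in X and add no new information to the stage-2 inputs.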
Copula-based conformal prediction for Multi-Target Regression
Messoudi, Soundouss, Destercke, Sébastien, Rousseau, Sylvain
The most common supervised task in machine learning is to learn a single-task, single-output prediction model. However, such a setting can be ill-suited to some problems and applications. On the one hand, producing a single output can be undesirable when data is scarce and when producing reliable, possibly set-valued predictions is important (for instance in the medical domain, where examples for specific targets are very hard to collect and predictions are used for critical decisions). This issue can be addressed with conformal prediction approaches [1]. Conformal prediction was initially proposed as a transductive online learning approach that provides set predictions (in the classification case) or interval predictions (in the regression case) with a statistical guarantee that depends on the probability of error tolerated by the user, and was later extended to inductive settings [2]. On the other hand, there are many situations where multiple, possibly correlated output variables must be predicted at once, and it is then natural to try to leverage these correlations to improve predictions. Such learning tasks are commonly called multi-task learning in the literature [3]. Most research on conformal prediction for multi-task learning focuses on multi-label prediction [4, 5], where each task is a binary classification problem. Conformal prediction for multi-target regression has been less explored, with only a few studies dealing with it: Kuleshov et al. [6] provide a theoretical framework for using conformal predictors within manifolds (e.g., to provide a one-dimensional embedding of the multivariate output), while Neeven and Smirnov [7] use a straightforward multi-target extension of a conformal single-output k-nearest neighbor regressor [8] to produce weather forecasts.
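The role a copula plays in joint coverage can be illustrated with a small Python (NumPy) sketch. It assumes per-target nonconformity scores (for example absolute residuals) on a calibration set and a hyper-rectangular prediction region. Under an independence copula, joint coverage is the product of the marginal coverages, so each of the d targets uses level (1 - eps)^(1/d). An empirical-copula variant instead searches for the smallest common marginal level whose per-target thresholds jointly cover at least 1 - eps of the calibration points. This search is a heuristic illustration, and the function names are hypothetical.

```python
import math
import numpy as np

def marginal_threshold(scores_j, level):
    """Conformal-style quantile of one target's scores: the ceil((n + 1) * level)-th smallest."""
    n = len(scores_j)
    k = math.ceil((n + 1) * level)
    return math.inf if k > n else float(np.sort(scores_j)[k - 1])

def independence_levels(eps, d):
    """Independence copula: joint coverage is the product of marginals, so use (1 - eps)**(1/d)."""
    return [(1 - eps) ** (1 / d)] * d

def empirical_copula_level(scores, eps, grid=200):
    """Smallest common marginal level whose thresholds jointly cover >= 1 - eps of the
    calibration rows (a heuristic empirical-copula search)."""
    n, d = scores.shape
    for level in np.linspace(1 - eps, 1.0, grid):
        thr = np.array([marginal_threshold(scores[:, j], level) for j in range(d)])
        if np.mean(np.all(scores <= thr, axis=1)) >= 1 - eps:
            return float(level), thr
    return 1.0, np.full(d, math.inf)

# Usage: strongly dependent residual scores for two targets.
rng = np.random.default_rng(2)
common = np.abs(rng.standard_normal(300))
scores = np.column_stack([common, common + 0.05 * np.abs(rng.standard_normal(300))])
level, thresholds = empirical_copula_level(scores, eps=0.1)
print(level, independence_levels(0.1, 2)[0])
```

When the targets' errors are strongly dependent, the empirical search typically settles on a lower per-target level than the independence assumption, which gives tighter hyper-rectangles for the same joint error rate.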
Deep tree-ensembles for multi-output prediction
Nakano, Felipe Kenji, Pliakos, Konstantinos, Vens, Celine
Recently, deep neural networks have advanced the state of the art in various scientific fields and provided solutions to long-standing problems across multiple application domains. Nevertheless, they also have weaknesses: their optimal performance depends on massive amounts of training data and on tuning a large number of parameters. As a countermeasure, several deep-forest methods have recently been proposed as efficient, low-scale solutions. However, these approaches simply use label classification probabilities as induced features and focus primarily on traditional classification and regression tasks, leaving multi-output prediction under-explored. Moreover, recent work has demonstrated that tree-embeddings are highly representative, especially in structured output prediction. In this direction, we propose a novel deep tree-ensemble (DTE) model, in which every layer enriches the original feature set with a representation learning component based on tree-embeddings. In this paper, we focus specifically on two structured output prediction tasks, namely multi-label classification and multi-target regression. We conducted experiments on multiple benchmark datasets, and the results confirm that our method outperforms state-of-the-art methods in both tasks.
Deep Hurdle Networks for Zero-Inflated Multi-Target Regression: Application to Multiple Species Abundance Estimation
Kong, Shufeng, Bai, Junwen, Lee, Jae Hee, Chen, Di, Allyn, Andrew, Stuart, Michelle, Pinsky, Malin, Mills, Katherine, Gomes, Carla P.
A key problem in computational sustainability is to understand the distribution of species across landscapes over time. This question gives rise to challenging large-scale prediction problems since (i) hundreds of species have to be modeled simultaneously and (ii) the survey data are usually inflated with zeros due to the absence of species at a large number of sites. The problem of tackling both issues simultaneously, which we refer to as the zero-inflated multi-target regression problem, has not been addressed by previous methods in statistics and machine learning. In this paper, we propose a novel deep model for the zero-inflated multi-target regression problem. To this end, we first model the joint distribution of multiple response variables as a multivariate probit model and then couple the positive outcomes with a multivariate log-normal distribution. By penalizing the difference between the two distributions' covariance matrices, a link between both distributions is established. The whole model is cast as an end-to-end learning framework, and we provide an efficient learning algorithm for our model that can be fully implemented on GPUs. We show that our model outperforms the existing state-of-the-art baselines on two challenging real-world species distribution datasets concerning bird and fish populations.
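The zero-inflation part of the likelihood can be made concrete with a simplified hurdle log-likelihood in Python (NumPy). Unlike the model described above, this sketch treats species independently. Each species is assumed present with probability p_j and, when present, its abundance is assumed log-normal with log-scale mean m_j and standard deviation s_j. The multivariate probit, the multivariate log-normal and the covariance-matching penalty are omitted, and the function name is hypothetical.

```python
import math
import numpy as np

def hurdle_lognormal_loglik(y, p, m, s):
    """Log-likelihood of abundances y under independent per-species hurdle models.

    y: observed abundances (0 = absent); p: presence probabilities;
    m, s: mean and std of log-abundance when present. All arrays share one shape.
    """
    y = np.asarray(y, float)
    p, m, s = (np.asarray(a, float) for a in (p, m, s))
    present = y > 0
    log_y = np.log(np.where(present, y, 1.0))  # dummy value avoids log(0) for absences
    log_lognormal = (-log_y - np.log(s) - 0.5 * math.log(2 * math.pi)
                     - (log_y - m) ** 2 / (2 * s ** 2))
    # zero: log(1 - p); positive: log p + log-normal log-density
    ll = np.where(present, np.log(p) + log_lognormal, np.log1p(-p))
    return float(ll.sum())

# Usage: three sites x two species, mostly zeros as in typical survey data.
abund = np.array([[0.0, 3.2], [0.0, 0.0], [1.5, 0.0]])
print(hurdle_lognormal_loglik(abund, p=np.full((3, 2), 0.3),
                              m=np.zeros((3, 2)), s=np.ones((3, 2))))
```

Separating "is the species there?" from "how many, given that it is there?" is what keeps the many zeros from dragging the abundance model toward zero.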
Feature Ranking for Semi-supervised Learning
Petković, Matej, Džeroski, Sašo, Kocev, Dragi
The data available for analysis are becoming increasingly complex along several dimensions: dimensionality, number of examples, and number of labels per example. This poses a variety of challenges for existing machine learning methods, which must cope with datasets containing many examples that are described in a high-dimensional space and not all of which are labeled. For example, when investigating the toxicity of chemical compounds, many compounds are available and can be described with information-rich, high-dimensional representations, but toxicity information is available for only some of them. To address these challenges, we propose semi-supervised learning of feature rankings. The feature rankings are learned in the context of classification and regression as well as in the context of structured output prediction (multi-label classification, hierarchical multi-label classification and multi-target regression). To the best of our knowledge, this is the first work that treats the task of feature ranking within the semi-supervised structured output prediction context. More specifically, we propose two approaches that are based on tree ensembles and the Relief family of algorithms. The extensive evaluation across 38 benchmark datasets reveals the following: Random Forests perform best on the classification-like tasks, while Extra-PCTs perform best on the regression-like tasks; Random Forests are the most efficient method in terms of induction time across all tasks; and semi-supervised feature rankings outperform their supervised counterparts on a majority of the datasets from the different tasks.
Multi-target regression via output space quantization
Spyromitros-Xioufis, Eleftherios, Sechidis, Konstantinos, Vlahavas, Ioannis
Multi-target regression is concerned with the prediction of multiple continuous target variables using a shared set of predictors. Two key challenges in multi-target regression are: (a) modelling target dependencies and (b) scalability to large output spaces. In this paper, a new multi-target regression method is proposed that tries to jointly address these challenges via a novel problem transformation approach. The proposed method, called MRQ, is based on the idea of quantizing the output space in order to transform the multiple continuous targets into one or more discrete ones. Learning on the transformed output space naturally enables modeling of target dependencies while the quantization strategy can be flexibly parameterized to control the trade-off between prediction accuracy and computational efficiency. Experiments on a large collection of benchmark datasets show that MRQ is both highly scalable and also competitive with the state-of-the-art in terms of accuracy. In particular, an ensemble version of MRQ obtains the best overall accuracy, while being an order of magnitude faster than the runner up method.
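The problem-transformation idea can be sketched in Python (NumPy) under simplifying assumptions. The sketch uses a single k-means quantizer over the full target vector and a nearest-centroid classifier in input space, which stands in for whatever classifier is trained on the discrete target. It does not reproduce MRQ's quantization strategy or its ensemble version, and the names are hypothetical.

```python
import numpy as np

def kmeans(Y, k, iters=50, seed=0):
    """Plain Lloyd's algorithm; returns (centres, labels)."""
    rng = np.random.default_rng(seed)
    centres = Y[rng.choice(len(Y), size=k, replace=False)].copy()
    for _ in range(iters):
        labels = np.argmin(((Y[:, None, :] - centres[None]) ** 2).sum(-1), axis=1)
        for c in range(k):
            if np.any(labels == c):  # keep an empty cluster's centre unchanged
                centres[c] = Y[labels == c].mean(axis=0)
    labels = np.argmin(((Y[:, None, :] - centres[None]) ** 2).sum(-1), axis=1)
    return centres, labels

class QuantizedMTR:
    """Quantize target vectors into k codewords, classify inputs into codewords,
    and predict the codeword (a vector holding all targets at once)."""

    def __init__(self, k=8):
        self.k = k

    def fit(self, X, Y):
        X, Y = np.asarray(X, float), np.asarray(Y, float)
        self.codebook, labels = kmeans(Y, self.k)
        self.classes = np.unique(labels)
        # Stand-in classifier: nearest class centroid in input space.
        self.x_centroids = np.array([X[labels == c].mean(axis=0) for c in self.classes])
        return self

    def predict(self, X):
        X = np.asarray(X, float)
        d2 = ((X[:, None, :] - self.x_centroids[None]) ** 2).sum(-1)
        return self.codebook[self.classes[np.argmin(d2, axis=1)]]

# Usage: two dependent targets quantized jointly.
rng = np.random.default_rng(3)
X = rng.uniform(-1, 1, size=(300, 2))
Y = np.column_stack([X[:, 0] + X[:, 1], X[:, 0] - X[:, 1]])
model = QuantizedMTR(k=8).fit(X, Y)
pred = model.predict(X[:10])
```

Because each codeword is a joint target vector, dependencies between targets are preserved automatically. The codebook size k is the knob that trades accuracy against cost.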
Online Multi-target regression trees with stacked leaf models
Mastelini, Saulo Martiello, Barbon, Sylvio Jr., de Carvalho, André Carlos Ponce de Leon Ferreira
The amount of available data is growing rapidly, so developing machine learning strategies that cope with high-throughput, changing data streams is highly relevant. Among the prediction tasks in online machine learning, multi-target regression has gained increasing attention due to its broad applicability and its relation to real-world problems. While reliable and effective solutions have been proposed for batch multi-target regression, the few existing solutions for the online scenario have gaps that warrant further investigation. In particular, none of the existing solutions takes inter-target correlations into account when making predictions. In this work, we propose an extension to existing decision-tree-based solutions for online multi-target regression that tackles this problem. Our proposal, called Stacked Single-target Hoeffding Tree (SST-HT), uses inter-target dependencies as an additional source of information to improve accuracy. In an extensive experimental setup, we evaluate our proposal against state-of-the-art decision-tree-based solutions for online multi-target regression on sixteen datasets. Our observations show that SST-HT achieves significantly smaller errors than the other methods while increasing time and memory requirements only slightly.
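The stacking idea can be illustrated at the level of a single leaf in Python (NumPy). This is a hypothetical simplification, not SST-HT itself. Tree growth (the Hoeffding-bound splitting) is omitted. Each leaf is assumed to hold per-target linear models trained online by SGD, plus meta models whose inputs are the per-target predictions for all targets. Class names and the learning rate are illustrative.

```python
import numpy as np

class SGDLinear:
    """Online linear regressor (weights plus bias) trained by SGD on squared error."""

    def __init__(self, n_features, lr=0.02):
        self.w = np.zeros(n_features)
        self.b = 0.0
        self.lr = lr

    def predict_one(self, x):
        return float(self.w @ x + self.b)

    def learn_one(self, x, y):
        err = self.predict_one(x) - y
        self.w -= self.lr * err * x
        self.b -= self.lr * err

class StackedLeaf:
    """Hypothetical leaf: per-target base models plus meta models over all base predictions."""

    def __init__(self, n_features, n_targets, lr=0.02):
        self.base = [SGDLinear(n_features, lr) for _ in range(n_targets)]
        self.meta = [SGDLinear(n_targets, lr) for _ in range(n_targets)]

    def _base_preds(self, x):
        return np.array([m.predict_one(x) for m in self.base])

    def predict_one(self, x):
        z = self._base_preds(x)
        return np.array([m.predict_one(z) for m in self.meta])

    def learn_one(self, x, y):
        z = self._base_preds(x)  # meta inputs computed before the base models update
        for j, m in enumerate(self.base):
            m.learn_one(x, y[j])
        for j, m in enumerate(self.meta):
            m.learn_one(z, y[j])

# Usage: a stream where the second target depends on the first.
rng = np.random.default_rng(0)
leaf = StackedLeaf(n_features=2, n_targets=2)
for _ in range(5000):
    x = rng.uniform(-1, 1, size=2)
    y1 = x[0] + x[1]
    leaf.learn_one(x, np.array([y1, 2 * y1]))
print(leaf.predict_one(np.array([0.5, -0.2])))
```

Each example is seen once and discarded, as in a data stream. The meta layer is where inter-target dependencies enter the prediction.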
Machine learning for predicting thermal power consumption of the Mars Express Spacecraft
Petković, Matej, Boumghar, Redouane, Breskvar, Martin, Džeroski, Sašo, Kocev, Dragi, Levatić, Jurica, Lucas, Luke, Osojnik, Aljaž, Ženko, Bernard, Simidjievski, Nikola
The thermal subsystem of the Mars Express (MEX) spacecraft keeps the on-board equipment within its pre-defined operating temperature range. To plan and optimize the scientific operations of MEX, its operators need to estimate the power consumption of the thermal subsystem in advance, as accurately as possible. The remaining power can then be allocated for scientific purposes. We present a machine learning pipeline for efficiently constructing accurate models of the power consumption of the thermal subsystem on board MEX. In particular, we employ state-of-the-art feature engineering approaches to transform raw telemetry data, which is then used to build accurate models with several state-of-the-art machine learning methods. We show that the proposed pipeline considerably improves on our previous (competition-winning) work in terms of both time efficiency and predictive performance. Moreover, while achieving superior predictive performance, the constructed models also provide important insight into the spacecraft's behavior, allowing for further analyses and optimal planning of MEX's operation.